Introduction
This document provides a comprehensive guide to using Source Links within Reactor. Source Links are a powerful feature that enables users to combine data from two or more data sources, allowing for unified mapping and analysis. This document explains the purpose of Source Links how they function, and provides step-by-step instructions for their configuration and management within Reactor.
What are Source Links?
Source Links enable the combination of data from multiple sources, facilitating the mapping of data from those sources to a single semantic model and destination schema. This capability is essential for integrating data from disparate systems and creating a consolidated view of information.
Use Cases for Source Links
Source Links are particularly useful in scenarios where data relevant to a single entity is spread across multiple sources. Common use cases include:
- Enriching Order Data: Appending Cost of Goods Sold (COGS) data from an ERP system to order data from an e-commerce platform.
- Integrating Fulfillment Information: Combining fulfillment data from an Order Management System (OMS) with order data from an e-commerce platform.
- Consolidating Product Information: Merging product data from a Product Information Management (PIM) system or ERP system with product data from an e-commerce platform.
Benefits of Using Source Links
Traditionally, these data linkage operations were performed within the data warehouse using SQL. However, this approach has several drawbacks:
- Increased Data Warehouse Costs: Ingesting unnecessary columns from source tables increases storage and processing costs within the data warehouse.
- Limited Accessibility of Linkage Logic: Embedding the linkage logic in SQL queries makes it difficult for non-technical users to understand and validate data transformations.
Reactor's Source Links feature addresses these issues by enabling data linkage within the platform before data is loaded into the data warehouse.
How Source Links Work
Illustrative Example: COGS Data Integration
To illustrate the functionality of Source Links, consider the COGS data integration example mentioned earlier:
-
Legacy Approach (Data Warehouse-Based): In a traditional setup, all data from the e-commerce platform and the ERP system would be ingested into the data warehouse, and the join operation would be performed using SQL.
-
Reactor Approach (Source Links): With Reactor, data is joined after initial data collection and atomization but before the mapping stage. This allows for more efficient data processing and reduces the amount of data the data warehouse processes.
Prerequisites for Using Source Links
Before creating Source Links, ensure the following prerequisites are met for all sources involved in the linkage:
- Data Source Connections: Establish connections to all necessary data sources within Reactor. (Refer to the "Connect all data sources" documentation for instructions.)
- Data Standard Deployment: Deploy a Data Standard for each data source to ensure data consistency and quality. (Refer to the "Deploy a Data Standard" documentation for instructions.)
Linking Sources: Step-by-Step Guide
Creating a Source Link
To create a Source Link:
-
Navigate to the Source Links Page:
- In Reactor, select "Source Links" from the left navigation bar.
- In Reactor, select "Source Links" from the left navigation bar.
-
Initiate Link Creation:
- The Source Links page displays a list of existing links.
- Click the "Add Link +" button below the list to open the "Create Source Link" dialog.
-
Select Sources to Link:
- The "Create Source Link" dialog contains a wizard walk-through and two drop-down menus over several steps.
- Use these menus to select the two sources you want to link. Ensure that both sources have a deployed Data Standard.
- Select the first source in the first screen dialog and the second source in the next screen dialog. Here is the first screen you will see in the wizard steps.
-
Configure Link Fields:
- The last screen in the wizard configures the fields for each source. See screenshot below.
- These fields are the ones whose values should match between records from the two sources.
- For example, when linking Product Cost data to Product Item Master data, you might select the
"PROUDUCTId"field from Product Cost and the"SKU"field from Product Item Master.
-
(Optional) Add Additional Field Pairs:
- If more than one set of fields needs to be used to determine how records from the same two sources should be linked, click the "+" button within an existing transform.
- This is common when linking COGS or shipping cost data to order data, as orders can have multiple line items or shipments.
- For example, when linking a COGS file to order data, you might need to match both an
order identifierand aproduct identifier. - When linking a shipping cost file, you might need to match an
order identifierand ashipment identifieror tracking number.
- For example, when linking a COGS file to order data, you might need to match both an
-
(Optional) Apply Transformations:
- If a field needs to be transformed to align with the corresponding field in the other source, you can use the "Optional Expression" dialog next to the field selection menus.
- Transformations can only be applied to the field selected from the right-hand source.
- See the "Using Transformation Functions in a Link Configuration" section below for details on supported transformations.
-
Save the Source Link:
- Once you have configured the Source Link, click the "Save" button, typically located at the bottom-right of the dialog.
- The new link will be added to the list of connections on the Source Links page.
-
Repeat for Additional Links (If Necessary):
- If you need to configure more Source Links, repeat steps 1 through 7 for each pair of sources.
Editing a Source Link
To edit an existing Source Link:
-
Select the Link to Edit:
- On the Source Links page, click the down arrow icon for the connection in the connections list corresponding to the link you want to modify.
- This action will open the "Transformation" dialog, which is populated with the link's current configuration.
- Then click the Edit button next to the down arrow as shown in the screenshot.
-
Make Necessary Changes:
- Modify the link configuration within the dialog as needed.
- After making changes, click the "Save" button at the bottom-right of the dialog.
- Important Note: If you change the source selection in either of the top menu bars, you must re-add and reconfigure all linked fields.
-
Deploy the Updated Link:
- Click the "Deploy" button at the top of the Source Links page to deploy the updated link configuration.
Using Transformation Functions in a Link Configuration
Reactor allows users to apply transformation functions to linked fields. This is useful when the values in the join fields are not identical across sources but can be made to match through transformation.
The following transformation functions are supported:
splitprefixsuffixlowercaseuppercasestring_to_intint_to_stringregex_replace
To use a transformation function:
- Enter the expression in the "Optional Expression" text box.
- Multiple transformations can be chained together, separated by commas.
- For example:
int_to_string, suffix('x')(This expression converts an integer to a string and appends "x" to the end).
- For example:
Refer to the Source Links Transformation Reference documentation for detailed information on each transformation function.
Linking More Than Two Sources
To link three or more sources, you need to create multiple Source Link configurations.
Example: Linking Three Sources
Consider a scenario where you need to join data from three sources:
- Source A: Order data with basic line item information (SKU, Quantity, Price).
- Source B: Product data with additional product details (UPC, Title).
- Source C: A product source that contains both SKU and UPC fields.
In this case, you want to map line item data from Source A and product titles from Source B to an orders model. Source C is needed to translate the SKUs in Source A to the UPCs in Source B.
You would configure the following Source Links:
-
Connection 1:
- Source A: SKU
- Source C: SKU
-
Connection 2:
- Source C: UPC
- Source B: UPC